AITopics

Country:

North America > United States > Massachusetts > Plymouth County > Norwell (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Neural Information Processing SystemsFeb-9-2026, 22:00:48 GMT

76df3f555683bc6c4b988bb81b930d5b-Paper-Conference.pdf

architecture, arxiv preprint arxiv, h-meta-na, (12 more...)

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsFeb-9-2026, 00:08:08 GMT

BRP-NAS: Prediction-based NAS using GCNs

Neural architecture search (NAS) enables researchers to automatically explore broad design spaces in order to improve efficiency of neural networks.

artificial intelligence, machine learning, predictor, (15 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.92)

Neural Information Processing SystemsOct-3-2025, 07:01:21 GMT

BRP-NAS: Prediction-based NAS using GCNs

Neural architecture search (NAS) enables researchers to automatically explore broad design spaces in order to improve efficiency of neural networks.

architecture search, latency predictor, predictor, (13 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.92)

Neural Information Processing SystemsAug-18-2025, 04:33:47 GMT

HELP: Hardware-Adaptive Efficient Latency Prediction for NAS via Meta-Learning

NAS collect a large number of samples (e.g., accuracy and latency) from a target

artificial intelligence, machine learning, predictor, (15 more...)

Country:

North America > United States > Massachusetts > Plymouth County > Norwell (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Neural Information Processing SystemsAug-16-2025, 01:32:48 GMT

76df3f555683bc6c4b988bb81b930d5b-Paper-Conference.pdf

artificial intelligence, arxiv preprint arxiv, machine learning, (14 more...)

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Nasir, Azaz-Ur-Rehman, Shoaib, Samroz Ahmad, Hanif, Muhammad Abdullah, Shafique, Muhammad

ESM: A Framework for Building Effective Surrogate Models for Hardware-Aware Neural Architecture Search

arXiv.org Artificial IntelligenceAug-5-2025

Hardware-aware Neural Architecture Search (NAS) is one of the most promising techniques for designing efficient Deep Neural Networks (DNNs) for resource-constrained devices. Surrogate models play a crucial role in hardware-aware NAS as they enable efficient prediction of performance characteristics (e.g., inference latency and energy consumption) of different candidate models on the target hardware device. In this paper, we focus on building hardware-aware latency prediction models. We study different types of surrogate models and highlight their strengths and weaknesses. We perform a systematic analysis to understand the impact of different factors that can influence the prediction accuracy of these models, aiming to assess the importance of each stage involved in the model designing process and identify methods and policies necessary for designing/training an effective estimation model, specifically for GPU-powered devices. Based on the insights gained from the analysis, we present a holistic framework that enables reliable dataset generation and efficient model generation, considering the overall costs of different stages of the model generation pipeline.

artificial intelligence, machine learning, predictor, (16 more...)

2508.01505

Country:

North America > United States (0.28)
Asia > Middle East > UAE (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

arXiv.org Artificial IntelligenceFeb-9-2025

HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location

Sun, Ting, Wang, Penghan, Lai, Fan

Large language models (LLMs) have facilitated a wide range of applications with distinct service-level objectives (SLOs), from latency-sensitive online tasks like interactive chatbots to throughput-oriented offline workloads like document summarization. The existing deployment model, which dedicates machines to each workload, simplifies SLO management but often leads to poor resource utilization. This paper introduces HyGen, an interference-aware LLM serving system that enables efficient co-location of online and offline workloads while preserving latency requirements. HyGen incorporates two key innovations: (1) performance control mechanisms, including a latency predictor to estimate batch execution time and an SLO-aware profiler to quantify latency interference, and (2) SLO-aware offline scheduling policies that maximize serving throughput and prevent starvation, without compromising online serving latency. Our evaluation on production workloads shows that HyGen achieves up to 3.87x overall throughput and 5.84x offline throughput gains over online and hybrid serving baselines, respectively, while strictly satisfying latency SLOs.

large language model, machine learning, throughput, (19 more...)

2501.14808

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > Illinois > Champaign County > Champaign (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Akhauri, Yash, Abdelfattah, Mohamed S.

On Latency Predictors for Neural Architecture Search

arXiv.org Artificial IntelligenceMar-4-2024

Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint on a specific hardware device. Central to these search algorithms is a prediction model that is designed to provide a hardware latency estimate for a candidate NN architecture. Recent research has shown that the sample efficiency of these predictive models can be greatly improved through pre-training on some training devices with many samples, and then transferring the predictor on the test (target) device. Transfer learning and meta-learning methods have been used for this, but often exhibit significant performance variability. Additionally, the evaluation of existing latency predictors has been largely done on hand-crafted training/test device sets, making it difficult to ascertain design features that compose a robust and general latency predictor. To address these issues, we introduce a comprehensive suite of latency prediction tasks obtained in a principled way through automated partitioning of hardware device sets. We then design a general latency predictor to comprehensively study (1) the predictor architecture, (2) NN sample selection methods, (3) hardware device representations, and (4) NN operation encoding schemes. Building on conclusions from our study, we present an end-to-end latency predictor training strategy that outperforms existing methods on 11 out of 12 difficult latency prediction tasks, improving latency prediction by 22.5% on average, and up to to 87.6% on the hardest tasks. Focusing on latency prediction, our HW-Aware NAS reports a 5.8 speedup in wall-clock time. NN architecture variants (Elsken, 2023), making it impossible to perform hardware-aware NN optimizations. For With recent advancements in deep learning (DL), neural these reasons, much research has investigated statistical latency networks (NN) have become ubiquitous, serving a wide predictors that can accurately model hardware device array of tasks in different deployment scenarios.

architecture, latency predictor, predictor, (10 more...)

2403.02446

Country:

North America > United States > California > Santa Clara County > Santa Clara (0.04)
North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Akhauri, Yash, Abdelfattah, Mohamed S.

Multi-Predict: Few Shot Predictors For Efficient Neural Architecture Search

arXiv.org Artificial IntelligenceJun-4-2023

Many hardware-aware neural architecture search (NAS) methods have been developed to optimize the topology of neural networks (NN) with the joint objectives of higher accuracy and lower latency. Recently, both accuracy and latency predictors have been used in NAS with great success, achieving high sample efficiency and accurate modeling of hardware (HW) device latency respectively. However, a new accuracy predictor needs to be trained for every new NAS search space or NN task, and a new latency predictor needs to be additionally trained for every new HW device. In this paper, we explore methods to enable multi-task, multi-search-space, and multi-HW adaptation of accuracy and latency predictors to reduce the cost of NAS. We introduce a novel search-space independent NN encoding based on zero-cost proxies that achieves sample-efficient prediction on multiple tasks and NAS search spaces, improving the end-to-end sample efficiency of latency and accuracy predictors by over an order of magnitude in multiple scenarios. For example, our NN encoding enables multi-search-space transfer of latency predictors from NASBench-201 to FBNet (and vice-versa) in under 85 HW measurements, a 400$\times$ improvement in sample efficiency compared to a recent meta-learning approach. Our method also improves the total sample efficiency of accuracy predictors by over an order of magnitude. Finally, we demonstrate the effectiveness of our method for multi-search-space and multi-task accuracy prediction on 28 NAS search spaces and tasks.

artificial intelligence, machine learning, predictor, (14 more...)

2306.02459

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)